Python and MatPlotLib

Python is an alternative scripting language that has become very popular among data analysts. In contrast to R, Python is a general scripting language that has had some interesting statistical and visualization packages developed for it, namely numpy (and the related scipy) and matplotlib. Although matplotlib is much closer to R's base graphics, it does provide a general visualization framework for the Python scripting language. Note that the syntax is very similar to plotting in MatLab.


In [1]:
%pylab inline
# If you are using a new version of ipython, change this to:
# %matplotlib inline


Welcome to pylab, a matplotlib-based Python environment [backend: module://IPython.zmq.pylab.backend_inline].
For more information, type 'help(pylab)'.

In [2]:
import matplotlib.pyplot as plt

In [3]:
x = linspace(0, 5, 10)
y = x ** 2

fig = plt.figure()
axes = fig.add_axes([0.1, 0.1, 1, 1]) # add an axes at the particular location, with the specified height [left, bottom, width, height]

axes.plot(x, y, 'g')
axes.set_xlabel('x')
axes.set_ylabel('y')
axes.set_title('title')


Out[3]:
<matplotlib.text.Text at 0x25bac10>

Here we have defined the xvalues as a series of 10 points from 0 ... 5, and yvalues as the square of x.

Metabolomics Data

Let's read in our metabolomics data set again, and visualize it in a similar way to what we did with ggplot.


In [4]:
import pandas as pd

In [5]:
metaDF = pd.read_csv("metabolomics_reshapedData.csv")

In [6]:
freshTom = metaDF['value'][metaDF['treat'] == "fresh"][metaDF['Species'] == 'tomatillo'][metaDF['value'] <= 5000]
freshPum = metaDF['value'][metaDF['treat'] == "fresh"][metaDF['Species'] == 'pumpkin'][metaDF['value'] <= 5000]

In [7]:
nBin = 100
plt.hist(freshTom, bins=nBin)


Out[7]:
(array([ 15.,  15.,  10.,  16.,   5.,  15.,  19.,  14.,  25.,  19.,  20.,
        14.,  11.,   9.,  20.,  13.,  14.,  11.,  13.,   9.,   7.,   8.,
         9.,  10.,   6.,   8.,   9.,   6.,   7.,   7.,   6.,   5.,   0.,
         5.,   5.,   3.,   5.,   5.,   5.,   3.,   2.,   5.,   1.,   2.,
         2.,   5.,   2.,   2.,   0.,   1.,   2.,   4.,   1.,   3.,   3.,
         4.,   3.,   3.,   2.,   5.,   2.,   2.,   1.,   3.,   2.,   1.,
         2.,   1.,   3.,   1.,   1.,   2.,   2.,   3.,   2.,   0.,   1.,
         1.,   4.,   2.,   1.,   2.,   1.,   1.,   1.,   2.,   4.,   1.,
         2.,   2.,   3.,   2.,   2.,   0.,   3.,   2.,   2.,   2.,   0.,
         5.]),
 array([   13.  ,    62.86,   112.72,   162.58,   212.44,   262.3 ,
         312.16,   362.02,   411.88,   461.74,   511.6 ,   561.46,
         611.32,   661.18,   711.04,   760.9 ,   810.76,   860.62,
         910.48,   960.34,  1010.2 ,  1060.06,  1109.92,  1159.78,
        1209.64,  1259.5 ,  1309.36,  1359.22,  1409.08,  1458.94,
        1508.8 ,  1558.66,  1608.52,  1658.38,  1708.24,  1758.1 ,
        1807.96,  1857.82,  1907.68,  1957.54,  2007.4 ,  2057.26,
        2107.12,  2156.98,  2206.84,  2256.7 ,  2306.56,  2356.42,
        2406.28,  2456.14,  2506.  ,  2555.86,  2605.72,  2655.58,
        2705.44,  2755.3 ,  2805.16,  2855.02,  2904.88,  2954.74,
        3004.6 ,  3054.46,  3104.32,  3154.18,  3204.04,  3253.9 ,
        3303.76,  3353.62,  3403.48,  3453.34,  3503.2 ,  3553.06,
        3602.92,  3652.78,  3702.64,  3752.5 ,  3802.36,  3852.22,
        3902.08,  3951.94,  4001.8 ,  4051.66,  4101.52,  4151.38,
        4201.24,  4251.1 ,  4300.96,  4350.82,  4400.68,  4450.54,
        4500.4 ,  4550.26,  4600.12,  4649.98,  4699.84,  4749.7 ,
        4799.56,  4849.42,  4899.28,  4949.14,  4999.  ]),
 <a list of 100 Patch objects>)

In [8]:
plt.hist(freshPum, bins=nBin)


Out[8]:
(array([ 27.,  32.,  36.,  27.,  20.,  21.,  22.,  13.,  15.,  14.,  13.,
        14.,  19.,  20.,  12.,   5.,  15.,  15.,  10.,  10.,  16.,   5.,
         8.,   7.,   9.,   7.,  10.,   4.,   6.,   4.,   3.,   4.,   7.,
         4.,   6.,   9.,   4.,   4.,   7.,  10.,   6.,   1.,  10.,   5.,
         3.,   4.,   4.,   4.,   1.,   3.,   4.,   7.,   2.,   2.,   3.,
         3.,   2.,   6.,   3.,   5.,   2.,   3.,   0.,   2.,   1.,   2.,
         0.,   2.,   4.,   2.,   0.,   1.,   2.,   0.,   2.,   2.,   0.,
         2.,   2.,   0.,   1.,   4.,   1.,   1.,   2.,   1.,   0.,   0.,
         2.,   1.,   3.,   1.,   4.,   1.,   0.,   0.,   0.,   2.,   1.,
         3.]),
 array([   17.  ,    66.82,   116.64,   166.46,   216.28,   266.1 ,
         315.92,   365.74,   415.56,   465.38,   515.2 ,   565.02,
         614.84,   664.66,   714.48,   764.3 ,   814.12,   863.94,
         913.76,   963.58,  1013.4 ,  1063.22,  1113.04,  1162.86,
        1212.68,  1262.5 ,  1312.32,  1362.14,  1411.96,  1461.78,
        1511.6 ,  1561.42,  1611.24,  1661.06,  1710.88,  1760.7 ,
        1810.52,  1860.34,  1910.16,  1959.98,  2009.8 ,  2059.62,
        2109.44,  2159.26,  2209.08,  2258.9 ,  2308.72,  2358.54,
        2408.36,  2458.18,  2508.  ,  2557.82,  2607.64,  2657.46,
        2707.28,  2757.1 ,  2806.92,  2856.74,  2906.56,  2956.38,
        3006.2 ,  3056.02,  3105.84,  3155.66,  3205.48,  3255.3 ,
        3305.12,  3354.94,  3404.76,  3454.58,  3504.4 ,  3554.22,
        3604.04,  3653.86,  3703.68,  3753.5 ,  3803.32,  3853.14,
        3902.96,  3952.78,  4002.6 ,  4052.42,  4102.24,  4152.06,
        4201.88,  4251.7 ,  4301.52,  4351.34,  4401.16,  4450.98,
        4500.8 ,  4550.62,  4600.44,  4650.26,  4700.08,  4749.9 ,
        4799.72,  4849.54,  4899.36,  4949.18,  4999.  ]),
 <a list of 100 Patch objects>)

In [9]:
lyophTom = metaDF['value'][metaDF['treat'] == "lyoph"][metaDF['Species'] == 'tomatillo'][metaDF['value'] <= 5000]
lyophPum = metaDF['value'][metaDF['treat'] == "lyoph"][metaDF['Species'] == 'pumpkin'][metaDF['value'] <= 5000]

In [10]:
plt.hist(lyophTom, bins=nBin)


Out[10]:
(array([ 12.,  19.,  15.,  25.,  19.,  19.,   9.,  16.,  18.,  19.,  11.,
        19.,  16.,   9.,  10.,  16.,  11.,   8.,  11.,   7.,  10.,   7.,
        13.,   7.,   5.,  10.,   5.,   1.,   6.,   7.,   7.,   5.,   3.,
         5.,   4.,   4.,   4.,   4.,   4.,   5.,   3.,   3.,   4.,   2.,
         4.,   1.,   5.,   5.,   1.,   3.,   1.,   2.,   1.,   3.,   0.,
         2.,   3.,   5.,   4.,   1.,   0.,   1.,   2.,   3.,   3.,   1.,
         0.,   3.,   3.,   1.,   2.,   1.,   0.,   4.,   1.,   1.,   3.,
         3.,   2.,   1.,   1.,   2.,   3.,   4.,   1.,   1.,   1.,   1.,
         0.,   0.,   2.,   3.,   0.,   2.,   2.,   0.,   3.,   2.,   3.,
         2.]),
 array([  2.00000000e+00,   5.19000000e+01,   1.01800000e+02,
         1.51700000e+02,   2.01600000e+02,   2.51500000e+02,
         3.01400000e+02,   3.51300000e+02,   4.01200000e+02,
         4.51100000e+02,   5.01000000e+02,   5.50900000e+02,
         6.00800000e+02,   6.50700000e+02,   7.00600000e+02,
         7.50500000e+02,   8.00400000e+02,   8.50300000e+02,
         9.00200000e+02,   9.50100000e+02,   1.00000000e+03,
         1.04990000e+03,   1.09980000e+03,   1.14970000e+03,
         1.19960000e+03,   1.24950000e+03,   1.29940000e+03,
         1.34930000e+03,   1.39920000e+03,   1.44910000e+03,
         1.49900000e+03,   1.54890000e+03,   1.59880000e+03,
         1.64870000e+03,   1.69860000e+03,   1.74850000e+03,
         1.79840000e+03,   1.84830000e+03,   1.89820000e+03,
         1.94810000e+03,   1.99800000e+03,   2.04790000e+03,
         2.09780000e+03,   2.14770000e+03,   2.19760000e+03,
         2.24750000e+03,   2.29740000e+03,   2.34730000e+03,
         2.39720000e+03,   2.44710000e+03,   2.49700000e+03,
         2.54690000e+03,   2.59680000e+03,   2.64670000e+03,
         2.69660000e+03,   2.74650000e+03,   2.79640000e+03,
         2.84630000e+03,   2.89620000e+03,   2.94610000e+03,
         2.99600000e+03,   3.04590000e+03,   3.09580000e+03,
         3.14570000e+03,   3.19560000e+03,   3.24550000e+03,
         3.29540000e+03,   3.34530000e+03,   3.39520000e+03,
         3.44510000e+03,   3.49500000e+03,   3.54490000e+03,
         3.59480000e+03,   3.64470000e+03,   3.69460000e+03,
         3.74450000e+03,   3.79440000e+03,   3.84430000e+03,
         3.89420000e+03,   3.94410000e+03,   3.99400000e+03,
         4.04390000e+03,   4.09380000e+03,   4.14370000e+03,
         4.19360000e+03,   4.24350000e+03,   4.29340000e+03,
         4.34330000e+03,   4.39320000e+03,   4.44310000e+03,
         4.49300000e+03,   4.54290000e+03,   4.59280000e+03,
         4.64270000e+03,   4.69260000e+03,   4.74250000e+03,
         4.79240000e+03,   4.84230000e+03,   4.89220000e+03,
         4.94210000e+03,   4.99200000e+03]),
 <a list of 100 Patch objects>)

In [11]:
plt.hist(lyophPum, bins=nBin)


Out[11]:
(array([ 22.,  27.,  33.,  15.,  21.,  16.,  17.,  20.,  15.,  13.,  12.,
        12.,  12.,  14.,  17.,   7.,  17.,  17.,   7.,   6.,  15.,  12.,
         8.,   4.,  11.,   6.,   3.,   7.,   9.,   6.,   4.,  10.,   5.,
         8.,   3.,   6.,   6.,   6.,   3.,   8.,   7.,   6.,   4.,   1.,
         3.,   6.,   6.,   2.,   2.,   2.,   4.,   9.,   6.,   9.,   5.,
         4.,   4.,   1.,   3.,   0.,   2.,   3.,   1.,   3.,   4.,   2.,
         4.,   6.,   1.,   1.,   1.,   2.,   0.,   2.,   0.,   1.,   3.,
         1.,   1.,   1.,   2.,   1.,   1.,   3.,   0.,   0.,   0.,   1.,
         0.,   1.,   0.,   1.,   1.,   1.,   2.,   2.,   0.,   1.,   0.,
         2.]),
 array([   13.  ,    62.25,   111.5 ,   160.75,   210.  ,   259.25,
         308.5 ,   357.75,   407.  ,   456.25,   505.5 ,   554.75,
         604.  ,   653.25,   702.5 ,   751.75,   801.  ,   850.25,
         899.5 ,   948.75,   998.  ,  1047.25,  1096.5 ,  1145.75,
        1195.  ,  1244.25,  1293.5 ,  1342.75,  1392.  ,  1441.25,
        1490.5 ,  1539.75,  1589.  ,  1638.25,  1687.5 ,  1736.75,
        1786.  ,  1835.25,  1884.5 ,  1933.75,  1983.  ,  2032.25,
        2081.5 ,  2130.75,  2180.  ,  2229.25,  2278.5 ,  2327.75,
        2377.  ,  2426.25,  2475.5 ,  2524.75,  2574.  ,  2623.25,
        2672.5 ,  2721.75,  2771.  ,  2820.25,  2869.5 ,  2918.75,
        2968.  ,  3017.25,  3066.5 ,  3115.75,  3165.  ,  3214.25,
        3263.5 ,  3312.75,  3362.  ,  3411.25,  3460.5 ,  3509.75,
        3559.  ,  3608.25,  3657.5 ,  3706.75,  3756.  ,  3805.25,
        3854.5 ,  3903.75,  3953.  ,  4002.25,  4051.5 ,  4100.75,
        4150.  ,  4199.25,  4248.5 ,  4297.75,  4347.  ,  4396.25,
        4445.5 ,  4494.75,  4544.  ,  4593.25,  4642.5 ,  4691.75,
        4741.  ,  4790.25,  4839.5 ,  4888.75,  4938.  ]),
 <a list of 100 Patch objects>)

That's great, but is there a way we can have all 4 together?


In [12]:
fig, axes = plt.subplots(nrows=2, ncols=2, sharey=True, sharex=True, squeeze=True)
axes[0,0].hist(lyophPum, bins=nBin)
axes[0,0].set_title("Lyoph Pumpkin")
axes[0,1].hist(lyophTom, bins=nBin)
axes[0,1].set_title("Lyoph Tomatillo")
axes[1,0].hist(freshPum, bins=nBin)
axes[1,0].set_title("Fresh Pumpkin")
axes[1,1].hist(freshTom, bins=nBin)
axes[1,1].set_title("Fresh Tomatillo")

fig.tight_layout()


Can we do any overlap of the histograms as we did in R with ggplot?


In [14]:
bins = linspace(0, 5000, 20)  # set up the bins in advance
plt.hist(lyophPum, bins, alpha=0.5, label='Lyoph Pumpkin')
plt.hist(lyophTom, bins, alpha=0.5, label='Lyoph Tomatillo')
plt.legend()


Out[14]:
<matplotlib.legend.Legend at 0x5b8d110>

Python and ggplot

There is a python port of ggplot that is available, however, it appears to still be very much beta software.


In [15]:
from ggplot import *

In [16]:
ggplot(diamonds, aes('carat', 'price')) + geom_point(alpha=1/20.) + ylim(0, 20000)


Out[16]:
<ggplot: (7177061)>

In [17]:
ggplot(diamonds, aes(x='price', color='cut')) + geom_density()


---------------------------------------------------------------------------
GgplotError                               Traceback (most recent call last)
<ipython-input-17-b86b6956dae9> in <module>()
----> 1 ggplot(diamonds, aes(x='price', color='cut')) + geom_density()

/usr/lib/python2.7/site-packages/IPython/core/displayhook.pyc in __call__(self, result)
    236             self.start_displayhook()
    237             self.write_output_prompt()
--> 238             format_dict = self.compute_format_data(result)
    239             self.write_format_data(format_dict)
    240             self.update_user_ns(result)

/usr/lib/python2.7/site-packages/IPython/core/displayhook.pyc in compute_format_data(self, result)
    148             MIME type representation of the object.
    149         """
--> 150         return self.shell.display_formatter.format(result)
    151 
    152     def write_format_data(self, format_dict):

/usr/lib/python2.7/site-packages/IPython/core/formatters.pyc in format(self, obj, include, exclude)
    124                     continue
    125             try:
--> 126                 data = formatter(obj)
    127             except:
    128                 # FIXME: log the exception

/usr/lib/python2.7/site-packages/IPython/core/formatters.pyc in __call__(self, obj)
    445                 type_pprinters=self.type_printers,
    446                 deferred_pprinters=self.deferred_printers)
--> 447             printer.pretty(obj)
    448             printer.flush()
    449             return stream.getvalue()

/usr/lib/python2.7/site-packages/IPython/lib/pretty.pyc in pretty(self, obj)
    358                             if callable(meth):
    359                                 return meth(obj, self, cycle)
--> 360             return _default_pprint(obj, self, cycle)
    361         finally:
    362             self.end_group()

/usr/lib/python2.7/site-packages/IPython/lib/pretty.pyc in _default_pprint(obj, p, cycle)
    478     if getattr(klass, '__repr__', None) not in _baseclass_reprs:
    479         # A user-provided repr.
--> 480         p.text(repr(obj))
    481         return
    482     p.begin_group(1, '<')

/usr/lib/python2.7/site-packages/ggplot/ggplot.pyc in __repr__(self)
    108     def __repr__(self):
    109         """Print/show the plot"""
--> 110         figure = self.draw()
    111         # We're going to default to making the plot appear when __repr__ is
    112         # called.

/usr/lib/python2.7/site-packages/ggplot/ggplot.pyc in draw(self)
    303 
    304                     data = self._make_plot_data(data, _aes)
--> 305                     callbacks = geom.plot_layer(data, ax)
    306                     if callbacks:
    307                         for callback in callbacks:

/usr/lib/python2.7/site-packages/ggplot/geoms/geom.pyc in plot_layer(self, data, ax)
    113         _cols = set(data.columns) & set(self.manual_aes)
    114         data = data.drop(_cols, axis=1)
--> 115         data = self._calculate_stats(data)
    116         self._verify_aesthetics(data)
    117         _needed = self.valid_aes | self._extra_requires

/usr/lib/python2.7/site-packages/ggplot/geoms/geom.pyc in _calculate_stats(self, data)
    270             for name, _data in data.groupby(sorted(groups)):
    271                 _data = _data.reindex()
--> 272                 _data = self._stat._calculate(_data)
    273                 new_data = new_data.append(_data, ignore_index=True)
    274         else:

/usr/lib/python2.7/site-packages/ggplot/stats/stat_density.pyc in _calculate(self, data)
     28             except:
     29                 raise GgplotError("stat_density(): aesthetic x mapping " +
---> 30                                 "needs to be convertable to float!")
     31         # TODO: Implement weight
     32         try:

GgplotError: u'stat_density(): aesthetic x mapping needs to be convertable to float!'

Saving Figures

Often we will want to be able to save figures for use in publications.


In [37]:
x = linspace(0, 5, 10)
y = x ** 2

fig = plt.figure()
axes = fig.add_axes([0.1, 0.1, 1, 1]) # add an axes at the particular location, with the specified height [left, bottom, width, height]

axes.plot(x, y, 'g')
axes.set_xlabel('x')
axes.set_ylabel('y')
axes.set_title('title');

fig.savefig("testFig.png")



In [38]:
x = linspace(0, 5, 10)
y = x ** 2

fig = plt.figure(dpi=400)
axes = fig.add_axes([0.1, 0.1, 1, 1]) # add an axes at the particular location, with the specified height [left, bottom, width, height]

axes.plot(x, y, 'g')
axes.set_xlabel('x')
axes.set_ylabel('y')
axes.set_title('title');

fig.savefig("hiRes.png")